Abstract: The deformable and continuum nature of soft
robots promises versatility and adaptability. However, control
of modular, multi-limbed soft robots for terrestrial locomotion 
is challenging due to the complex robot structure, actuator 
mechanics and robot-environment interaction. Traditionally, 
soft robot control is performed by modeling kinematics using 
exact geometric equations and finite element analysis. 

The research presents an alternative, model-free, data-driven,
reinforcement learning inspired approach for controlling
multi-limbed soft material robots. This control approach can be
summarized as a four-step process of discretization, visualization,
learning and optimization. The first step involves identification
and subsequent discretization of key factors that dominate
robot-environment interaction and, in turn, the robot control.
Graph theory is used to visualize relationships and transitions
between the discretized states. The graph representation
facilitates mathematical definition of periodic control patterns
(simple cycles) and locomotion gaits. Rewards corresponding to
individual arcs of the graph are weighted displacement and
orientation change for robot state-to-state transitions. These
rewards are specific to the surface of locomotion and are learned.
Finally, the control patterns result from optimization of a
reward-dependent locomotion task (e.g. translation) cost function.
The optimization problem is an Integer Linear Programming problem
which can be quickly solved using standard solvers.

The framework is generic and independent of the type of actuator,
soft material properties or the type of friction mechanism,
as the control exists in the robot's task space. Furthermore,
the data-driven nature of the framework imparts adaptability
to the framework toward different locomotion surfaces by
re-learning rewards.

I. Introduction

Roboticists in recent years have been inspired by the 
ability of animals to leverage structural soft materials for 
locomotion and manipulation tasks. This has resulted in the 
development of soft material robots powered by a variety of 
actuators including bio-inspired soft [1], [2] and continuum 
manipulators [3], rigid link-based snake-like robots [4], [5], 
pneumatic soft multi-gait robots [6] and shape memory 
alloy actuated soft robots [7]-[9]. The control of flexible- 
link robots has been done using model-based or model-free 
control approaches [10]. The soft continuum manipulator 
control is traditionally performed using continuum modeling 
techniques [11]-[13], while non-continuous curvature soft 
robots have been controlled using fast finite element methods
[14]. Soft robots capable of terrestrial locomotion interact 
extensively with the environment and manipulate friction 
to facilitate movement. The control of such robots using a
model-based approach will involve detailed mathematical
descriptions [15] of the robot kinematics, dynamics, actuator
mechanics and, most importantly, the robot-environment
interaction. This approach is computationally intensive [16]
and robot specific. In this research, we present an alternative
model-free approach that is generic and adaptable.

The following sections describe the mathematical basis for
model-free control: state transitions are represented on
a directed graph, and the gait optimization problem is then
solved on that graph. An experimental test case of a
three-limbed soft robot is presented, followed by a discussion
of the general utility of the approach.

Contribution: The research presents a generic, adaptive,
data-driven model-free control framework for locomotion
control of multi-limbed terrestrial soft robots. The framework
can be summarized as a four-step process of 1) discretizing
key factors dominating robot control via robot-environment
interaction, 2) using graph theory to visualize relationships
between discretized robot states, 3) learning the
surface-dependent results of state transitions and 4) optimizing
a desired cost function to obtain a control sequence.
Computationally, the optimization problem is an Integer Linear
Programming (ILP) problem that can be solved using standard
solvers. The use of graph theory facilitates mathematical
definition of periodic control patterns (simple cycles) that
form a linear basis for locomotion gaits. Furthermore, the
graph representation introduces robustness into the control
framework by making it fault-tolerant.

II. Model-free control framework 

Controlling locomotion by soft robots is challenging due
to the difficulty of accurately modeling most robot-environment
interactions. This interaction is easier to model in fluids
because there are good mathematical tools for describing
force propagation in continuous media; however, modeling
discontinuous terrestrial interactions is much more complex.
Furthermore, the modeling of the soft robot may be restricted
by shape (for analytical continuum solutions) [12], [15]
or involve simplification/discretization [16]. The complexity
is further increased by actuator-specific modeling. As an
example, the properties of shape memory alloy actuators
(SMAs) change over time because heat flux cannot be
controlled very precisely in natural settings. The model-free
control approach takes inspiration from reinforcement
learning [17] by focusing on goal-directed learning. This
approach does not directly model the robot kinematics, the
actuator or the robot-environment interaction, but indirectly
accounts for the robot-environment interaction by observing
the effects of changes in that interaction.



In the related model reduction literature, this approach is 
often called the input-output approach [18]. Usually the 
input-output approach is preferred when the full dynamics of 
the system are complicated, and the input action is relatively 
limited. 

In this work, the robot locomotion is formulated as a class
of optimization problems on directed graphs. An analogy
between the language of graph theory and soft robot
locomotion is constructed. This analogy is constructed alongside
an example of a robot to facilitate better understanding of
the framework.

Example robot: The example robot is a monolithic 3-D
printed soft robot with a soft deformable body and two
gripper-like (friction manipulation) mechanisms, one at each
end of the robot, as illustrated in Figure 1.


A. Discretization of robot-environment interaction 

Locomotion results from manipulation and optimization
of friction forces at different parts of a body [19]. This may
be performed using directional friction or with a mechanical
or chemical mechanism [4], [20]. The body-environment
interaction can be discretized into a small number of finite
behaviors.

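Definition: Behavior, denoted by $B$, is a discrete behavior
of a system part, e.g. for a robot sub-system, grip on/off
($B_g$) or directional friction ($B_{df}$):

$$ B_g = \begin{cases} 0 & \text{for grip on} \\ 1 & \text{for grip off} \end{cases} \tag{1} $$

$$ B_{df} = \begin{cases} 0 & \text{movement in preferred direction} \\ 1 & \text{movement in opposite direction} \end{cases} \tag{2} $$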


Typically, an actuator independently controls the behavior
of a robot sub-system, but this behavior representation of
robot-environment interaction is independent of the type of
actuator.

Definition: State, denoted by $S$, consists of the corresponding
behaviors of all robot sub-systems. The total number of
states $n$ for a robot with $m$ sub-system robot parts, each
having $b$ discrete behaviors, can be defined as

$$ n = b^{m} \tag{3} $$



Fig. 1. Example soft robot comprises a soft, deformable body and
gripper-like friction manipulation mechanisms at both ends (front and
rear) of the robot. The actuators (gold channel starting from yellow
circle and ending at red circle) independently control each friction
manipulation mechanism.



Fig. 2. The directed graph corresponding to the example robot comprising
two system parts ($m = 2$) such that each part has two discretized
behaviors ($b = 2$). The $n = 4$ nodes are $N_1, N_2, N_3, N_4$. They
correspond to the four robot states {00}, {01}, {10}, {11}.


The arcs represent the transition from one robot state to
another. This robot state transition, equivalent to a change in
robot-environment interaction, will result in some translation
($\Delta x$, $\Delta y$) and rotation ($\Delta\theta$) on a plane.
The weighted result is called the state transition reward. The
state transition reward vector $R_i \in \mathbb{R}^{3}$,
corresponding to the vector arc weight of arc $A_i$, is written as

$$ R_i = [w_{ix}, w_{iy}, w_{i\theta}]^{T} \quad \text{for } i = 1, \dots, P \tag{4} $$


The example soft robot has two gripper-like friction
manipulation mechanisms ($m = 2$ sub-systems) that have binary
behaviors ($b = 2$). From Equation 3, the total number of
states is therefore $n = 4$, and the states can be exhaustively
written as {(00), (01), (10), (11)}.

B. Visualization, Learning and Graph Theory 

Definition: Each graph node represents one robot state.
The nodes are denoted by $N_i$ for $i = 1, \dots, n$.

Each directed arc of the graph [21] is a connection between two
different nodes. The total number of arcs for the case where
the robot can transition from any node to another (fully
connected graph) is $P = n \cdot (n - 1)$. Each arc is identified as
$A_k$ for $k = 1, 2, \dots, P$. These nodes connected by directed
arcs comprise a directed graph. The directed graph for the
example soft robot with $P = 12$ arcs is shown in Figure 2.
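The state count of Equation 3 and the arc count $P = n \cdot (n - 1)$ can be checked with a minimal sketch. The function name `build_state_graph` and the tuple encoding of states are illustrative assumptions, not part of the original framework.

```python
from itertools import product, permutations

def build_state_graph(m, b):
    """Return (states, arcs) for m sub-systems with b discrete behaviors each."""
    # Each state is a tuple of behaviors, e.g. (0, 1) for the state (01).
    states = list(product(range(b), repeat=m))
    # Fully connected directed graph: one arc per ordered pair of distinct nodes.
    # Nodes are numbered from 1 to match N_1 ... N_n in the text.
    arcs = list(permutations(range(1, len(states) + 1), 2))
    return states, arcs

states, arcs = build_state_graph(m=2, b=2)
n, P = len(states), len(arcs)
print(states)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(n, P)    # 4 12
```

Calling `build_state_graph(3, 2)` gives 8 states and 56 arcs, matching the three-limb robot of Section III.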


The full reward matrix for the graph is

$$ R = [R_1 \; R_2 \; \cdots \; R_P] \tag{5} $$

Learning. The state transition rewards are experimentally
determined and change with the surface of contact as they
attempt to indirectly model the robot-environment interaction.
This learning ability imparts adaptability to the framework by
facilitating compensation (learning) for unexpected changes
in the environment.

Definition: Simple cycle. A closed walk consists of a
sequence of nodes starting and ending at the same node,
with each two consecutive nodes in the sequence connected
by a directed arc. A simple cycle is a closed walk with no
repetitions of nodes and directed arcs allowed, other than
the repetition of the starting and the ending node. Simple
cycles may also be described by their sets of directed arcs,
unlike closed walks, for which the multi-set of arcs does not
unambiguously determine the node ordering. A simple cycle is
represented by a vector $c_i \in \{0,1\}^{P}$ for $i = 1, \dots, K$,
where $K$ is the total number of cycles and

$$ c_{ij} = \begin{cases} 1 & \text{if } c_i \text{ includes arc } A_j \\ 0 & \text{otherwise} \end{cases} \tag{6} $$


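For the example simple cycle $c_{eg}$ comprising arcs
$A_1, A_6, A_7, A_{11}$ (numbered as in Figure 2), the cycle
vector $c_{eg} \in \{0,1\}^{P}$ is written as

$$ c_{eg,j} = \begin{cases} 1 & \text{for } j = 1, 6, 7, 11 \\ 0 & \text{otherwise} \end{cases} \tag{8} $$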
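The indicator-vector construction of Equations 6 and 8 can be sketched as follows. The arc numbering here (sorted by tail node, then head node) is an assumption made for illustration; the numbering in Figure 2 is evidently different, so the nonzero positions will not match Equation 8.

```python
from itertools import permutations

def arc_index(n):
    """Map each directed arc (tail, head) to a 0-based index.

    Assumed numbering: arcs sorted by tail node, then head node.
    The numbering used in the paper's Figure 2 may differ.
    """
    return {arc: k for k, arc in enumerate(permutations(range(1, n + 1), 2))}

def cycle_vector(node_cycle, n):
    """Indicator vector c with c[j] = 1 if the cycle uses arc j, else 0."""
    index = arc_index(n)
    c = [0] * len(index)
    # Pair each node with its successor, wrapping back to the start node.
    for tail, head in zip(node_cycle, node_cycle[1:] + node_cycle[:1]):
        c[index[(tail, head)]] = 1
    return c

c_eg = cycle_vector([1, 2, 3, 4], n=4)   # N1 -> N2 -> N3 -> N4 -> N1
print(c_eg)  # [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
```

Because a simple cycle visits each node at most once, the set of arcs alone fixes the node ordering, which is why a 0/1 vector suffices.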


Definition: A circulation is a linear integer combination
of simple cycles

$$ L = \sum_{i=1}^{K} x_i c_i, \quad x_i \in \{0, 1, 2, \dots\} \tag{9} $$


We define a locomotion gait as being equivalent to a circulation.
It is important to note that this notation does not define
the order of simple cycles, rather, only the combination. The
reward for the circulation can be similarly written as


$$ J(L) = \sum_{i=1}^{K} x_i J_i, \quad x_i \in \{0, 1, 2, \dots\} \tag{10} $$
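As a worked illustration of the circulation reward, assume two simple cycles of the three-limb robot whose rewards are summed from the arc values in Appendix II: $c_a = \{N_1 \to N_8 \to N_5 \to N_1\}$ with $J_a = [6, 0.5, 0]^{T}$ and $c_b = \{N_1 \to N_8 \to N_6 \to N_1\}$ with $J_b = [4, -1, 0]^{T}$. The choice of cycles and multiplicities is ours, for illustration only.

```latex
L = 2c_a + c_b, \qquad
J(L) = 2J_a + J_b
     = 2\begin{bmatrix} 6 \\ 0.5 \\ 0 \end{bmatrix}
     + \begin{bmatrix} 4 \\ -1 \\ 0 \end{bmatrix}
     = \begin{bmatrix} 16 \\ 0 \\ 0 \end{bmatrix}
```

In this combination the residual $y$ translations of the two cycles cancel, leaving pure $x$ translation.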


The non-ambiguous representation of locomotion gaits using 
simple cycles, which form the linear basis, is very important 
as it allows for a simple formulation of the optimization 
problem. 

C. Optimization and Finding Gaits 

The locomotion gaits of robots optimize some cost function,
e.g. maximize translation or rotation for a given number
of steps. For such analysis, the reward for a circulation $L$ is
decomposed into three components, corresponding to the
$x$, $y$, $\theta$ components

$$ J(L) = [J^{x}, J^{y}, J^{\theta}]^{T} \tag{11} $$


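Consider the problem of finding a locomotion gait which
maximizes the translation in the $+X$ direction. We impose
a constraint on the maximum number of state transitions
allowed, $\mathrm{len}(L)$, and on the permitted residual
translation in the $Y$ direction and rotation in $\theta$.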



Given the graph structure (i.e. set of nodes, and directed 
arcs), the problem of finding all simple cycles can be solved 
efficiently [22]. We use the open source software NetworkX 
[23] to obtain the solution of this problem. While the graph 
is fully connected, many of the state transitions do not lead 
to any significant locomotion. Hence, the directed arcs with 
all rewards below a certain threshold can be removed from 
the graph structure before doing the gait computation. 
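The pruning and enumeration steps can be sketched in plain Python. The paper uses NetworkX for cycle enumeration; the depth-first search below is a stand-in written for self-containment, not the algorithm of [22]. The function names and the magnitude-based threshold test are assumptions, and the six arc rewards are copied from Appendix II.

```python
def prune_arcs(rewards, threshold):
    """Keep arcs with at least one reward component of magnitude >= threshold."""
    return {arc: r for arc, r in rewards.items()
            if any(abs(v) >= threshold for v in r)}

def simple_cycles(arcs):
    """Enumerate simple cycles of a directed graph given as (tail, head) pairs."""
    succ = {}
    for tail, head in arcs:
        succ.setdefault(tail, []).append(head)
    cycles = []

    def extend(start, path, visited):
        for nxt in succ.get(path[-1], []):
            if nxt == start:
                cycles.append(list(path))      # closed back at the start node
            elif nxt > start and nxt not in visited:
                # Visit only nodes larger than start, so each cycle is
                # found once, rooted at its smallest node.
                extend(start, path + [nxt], visited | {nxt})

    for start in sorted(succ):
        extend(start, [start], {start})
    return cycles

# Arc rewards [x, y, theta] among nodes 1, 5 and 8, from Appendix II.
rewards = {
    (1, 5): [0, 0, 0], (1, 8): [1, 0.5, 0],
    (5, 1): [0, 0, 0], (5, 8): [-1, 0, 0],
    (8, 1): [1, 0.5, 0], (8, 5): [5, 0, 0],
}
print(len(simple_cycles(rewards)))                 # 5
print(simple_cycles(prune_arcs(rewards, 0.5)))     # [[1, 8], [5, 8]]
```

Note that pruning also removes zero-reward return arcs such as $N_5 \to N_1$, which eliminates cycles that rely on them; the threshold therefore trades graph size against the set of gaits that remain reachable.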

The simple cycles are periodic cycles of state transitions
and act as a linear basis for finding locomotion gaits
(circulations). The reward vector associated with every individual
simple cycle is referred to as the simple cycle reward
$J_i \in \mathbb{R}^{3}$

$$ J_i = R c_i \tag{7} $$
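Since $c_i$ is a 0/1 vector, the product $R c_i$ of Equation 7 is simply the sum of the reward columns of the arcs on the cycle. A minimal sketch, using three arc rewards copied from Appendix II (the helper name `cycle_reward` is an assumption):

```python
# Rewards [x, y, theta] for three arcs, copied from Appendix II.
arc_rewards = {
    (1, 8): [1, 0.5, 0],
    (8, 5): [5, 0, 0],
    (5, 1): [0, 0, 0],
}

def cycle_reward(node_cycle, arc_rewards):
    """Sum the arc reward vectors around a simple cycle (J_i = R c_i)."""
    total = [0.0, 0.0, 0.0]
    # Pair each node with its successor, wrapping back to the start node.
    for tail, head in zip(node_cycle, node_cycle[1:] + node_cycle[:1]):
        for axis, value in enumerate(arc_rewards[(tail, head)]):
            total[axis] += value
    return total

J = cycle_reward([1, 8, 5], arc_rewards)   # N1 -> N8 -> N5 -> N1
print(J)  # [6.0, 0.5, 0.0]
```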

For the example soft robot graph structure, a simple cycle
$c_{eg}$ of $\{N_1 \to N_2 \to N_3 \to N_4 \to N_1\}$ will comprise
arcs $A_1, A_7, A_{11}, A_6$, corresponding to the numbering given
in Figure 2.

Fig. 3. Analogy between directed graph and robot mechanics for the
model-free control framework


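The number of state transitions in a gait is

$$ \mathrm{len}(L) = \sum_{i=1}^{K} \sum_{j=1}^{P} x_i c_{ij}, $$

and the permitted residuals bound the translation in the $\pm Y$
direction ($J^{y}$) and the rotation in the $\pm\theta$ direction
($J^{\theta}$). Hence, the optimization problem is written as

$$ \max_{x_1, \dots, x_K} J^{x} \tag{12} $$

with constraints

$$ J^{y} \in [-\epsilon_{y-}, \epsilon_{y+}], \quad J^{\theta} \in [-\epsilon_{\theta-}, \epsilon_{\theta+}], \quad \mathrm{len}(L) \le l_{max} \tag{13} $$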


This problem is an integer linear programming (ILP) problem
and can be solved for small to medium size graphs using one
of the many standard solvers such as the MATLAB Optimization
Toolbox [24] and Gurobi [25].
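For a handful of cycles, the problem of Equations 12 and 13 is small enough to solve by exhaustive search, which makes the structure of the ILP easy to inspect. The sketch below is not an ILP solver and is not the method used in the paper. It assumes symmetric tolerances ($\epsilon_{y-} = \epsilon_{y+}$, $\epsilon_{\theta-} = \epsilon_{\theta+}$) and three hand-picked cycles whose rewards are summed from Appendix II.

```python
from itertools import product

# Simple cycle rewards J_i = [x, y, theta] and lengths (number of arcs),
# summed from Appendix II for three cycles through node 1.
cycles = {
    "1-8-5-1": ([6.0, 0.5, 0.0], 3),
    "1-8-6-1": ([4.0, -1.0, 0.0], 3),
    "1-2-8-1": ([4.5, 1.0, 0.0], 3),
}

def best_gait(cycles, l_max, eps_y, eps_theta):
    """Exhaustively search integer multiplicities x_i >= 0 that maximize J^x."""
    names = list(cycles)
    # No cycle can be used more than l_max // length times.
    ranges = [range(l_max // cycles[k][1] + 1) for k in names]
    best_x, best_jx = None, float("-inf")
    for x in product(*ranges):
        length = sum(xi * cycles[k][1] for xi, k in zip(x, names))
        if length > l_max:
            continue
        jx, jy, jt = (sum(xi * cycles[k][0][axis] for xi, k in zip(x, names))
                      for axis in range(3))
        if abs(jy) <= eps_y and abs(jt) <= eps_theta and jx > best_jx:
            best_x, best_jx = dict(zip(names, x)), jx
    return best_x, best_jx

x_opt, jx_opt = best_gait(cycles, l_max=15, eps_y=1.0, eps_theta=5.0)
print(x_opt, jx_opt)
# {'1-8-5-1': 4, '1-8-6-1': 1, '1-2-8-1': 0} 28.0
```

The search space grows exponentially with the number of cycles, so a dedicated ILP solver remains the practical choice for the full graph.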

This analogy of soft robot locomotion with graph theory
(Figure 3) is advantageous in four ways: 1) the graph
representation of discretized robot behaviors facilitates easy
visualization, 2) the unique mathematical representation of the
simple cycle vector (Equation 6) is instrumental in defining
3) the circulation-locomotion gait analogy (Equation 9), and
4) the optimization problem is an ILP problem (Equations 12, 13)
which can be quickly solved using standard linear solvers
for small to medium size graphs.

Speed. The periodic control sequences (simple cycles) are
independent of actuation and material variations. Hence, the
speed of locomotion is solely dependent on the speed at which a
robot can transition from one state to another. As an example,
consider two identical soft robots $R_1$, $R_2$ actuated by
actuators (e.g. motors) $M_1$, $M_2$ with power $P_1$, $P_2$ such
that $P_2 > P_1$. Here, the actuator $M_2$ facilitates faster
transition from one state to another; therefore, $R_2$ is capable
of faster locomotion than $R_1$ with the same control sequence.
A similar argument also holds for robots designed using two
different materials, one having a faster rate of deformation
than the other.

D. Extensions to other locomotion tasks 


The framework described in Section II-C can be generalized
to include more complicated gaits and locomotion
objectives. In our formulation so far, we have assumed
that the reward matrices remain effectively independent of
the robot coordinate system. This assumption holds only
approximately for the case when $\theta$ is small, but in case of
larger $\theta$ displacement, the displacement in the $x$ and $y$
directions is multiplied by a rotation matrix. The resulting
locomotion optimization problem essentially becomes a nonlinear
integer programming problem. The resulting gaits can have
arbitrarily large angular displacement, and can lead to highly
complex gaits. Furthermore, this framework could also be
adapted to have the robot follow a given curved path. This
extension will be the subject of future work.
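The effect of the rotation matrix can be seen in a short sketch that composes body-frame rewards onto a world-frame pose. The function `apply_transition`, the assumption that $\theta$ is measured in degrees, and the two reward values are all hypothetical choices for illustration.

```python
import math

def apply_transition(pose, reward):
    """Compose a body-frame displacement (dx, dy, dtheta) onto a world pose.

    Angles are assumed to be in degrees; the paper does not state units.
    """
    x, y, theta = pose
    dx, dy, dtheta = reward
    c, s = math.cos(math.radians(theta)), math.sin(math.radians(theta))
    # Rotate the body-frame translation into the world frame, then add.
    return (x + c * dx - s * dy, y + s * dx + c * dy, theta + dtheta)

pose = (0.0, 0.0, 0.0)
for reward in [(2.0, -0.5, 30.0), (2.0, -0.5, 30.0)]:  # hypothetical rewards
    pose = apply_transition(pose, reward)

linear_sum = (4.0, -1.0, 60.0)   # what the linear model of Section II-C predicts
print(pose, linear_sum)
```

After two steps the composed pose is roughly (3.98, 0.07, 60), whereas plain summation gives (4, -1, 60); the discrepancy grows with accumulated rotation, which is what makes the extended problem nonlinear.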

The robustness and fault tolerance of the framework are
shown by its ability to efficiently respond to scenarios
such as the loss of a limb. In this case, one of the
actuators/sub-systems of the robot becomes inoperable. A
runtime modification of the graph structure (by removing the
corresponding nodes and arcs), followed by re-computation
of the optimal gaits, can handle this situation.

Furthermore, given the data on a certain class of actuators,
the length constraint of maximum number of state transitions
in Equation 13 can be replaced by a weighted constraint, i.e.
$\sum_{i=1}^{K} \sum_{j=1}^{P} x_i c_{ij} T_j \le t_{max}$, where
$T \in \mathbb{R}^{P}$ is the vector containing the time taken for
each state transition, and $t_{max}$ is the maximum allowable
gait time.
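The time-weighted constraint can be evaluated directly from the cycle vectors. In this minimal sketch the transition times, cycle vectors and multiplicities are hypothetical values on a four-arc toy graph.

```python
def gait_time(x, cycle_vectors, T):
    """Total time sum_i sum_j x_i * c_ij * T_j for a circulation."""
    return sum(xi * cij * tj
               for xi, c in zip(x, cycle_vectors)
               for cij, tj in zip(c, T))

# Hypothetical data: 4 arcs, 2 simple cycles, per-arc times in seconds.
T = [1.5, 2.0, 0.5, 1.0]
cycle_vectors = [[1, 1, 0, 0], [0, 0, 1, 1]]
x = [2, 1]

t = gait_time(x, cycle_vectors, T)   # 2*(1.5 + 2.0) + 1*(0.5 + 1.0)
t_max = 10.0
print(t, t <= t_max)  # 8.5 True
```

Setting every entry of `T` to 1 recovers the plain length constraint $\mathrm{len}(L) \le l_{max}$.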

III. Experiment 

The experimental soft robot is similar to the example soft
robot, but has three limbs with gripper-like two-state friction
mechanisms at the end of each limb, as shown in Figure 4. The
soft robot has a soft deformable body made of rubber-like
TangoPlus™ and is printed on a Connex 500™ multi-material
3D printer. Each friction mechanism uses soft rubber-like
TangoPlus™ as the sticky material and hard ABS-like
VeroClear™ as the slippery material.

These gripper-like mechanisms are independently controlled
using NiTi SMA actuators (Toki Corporation®).
The SMA coils are electrically activated to shorten by
Joule heating, and they relax to the original shape using
the stored elastic energy upon deactivation. SMA activation
changes both the limb shape and its friction state as the
contact angle between the limb and the surface varies about
the critical contact angle $\psi^{*}$ (Figure 4). The properties of
SMA coils may vary over time due to inconsistent cooling,
etc., but the discretization of sub-system behavior (friction
mechanism) makes the control sequence independent of the
precise condition of the actuator, and merely dependent on
the contact angle. This soft robot has three ($m = 3$)
sub-systems, each having two discrete behaviors ($b = 2$); thus,
the total number of states is $n = 2^{3} = 8$ (visual representation
in Appendix I), such that the node $N_i$ and robot state analogy
can be expressed as

$$ N_i \equiv \mathrm{dec2bin}(i - 1), \quad i = 1, 2, \dots, 8 \tag{14} $$



where the dec2bin function converts decimal to binary format.
This fully connected directed graph comprises
$P = 8 \cdot (8 - 1) = 56$ arcs. The experiment is run on a smooth
planar surface (table-top) and the state transition rewards
(arc weights) are recorded as the weighted mean of the relative
change in position and orientation for 10 state transition
repetitions (Appendix II). The weighted state transition rewards
are dependent on the robot and the surface of contact.

Fig. 4. Three-limb robot with deformable soft body and a gripper-like
friction mechanism at the end of each limb. The gripper-like friction
mechanism is made using two materials with different coefficients of
friction: a sticky/soft and a slippery/hard material. The material of
contact changes with the shape of the robot and can exist in two
discrete behaviors, 0 and 1, such that the switch happens about the
critical contact angle $\psi^{*}$. The behavior of these friction
mechanisms is independently controlled (via the limb shape) using
three embedded shape memory alloy (SMA) actuators.
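Learning one arc reward from repeated trials can be sketched as a weighted mean over the measured displacements. The weighting scheme used in the experiment is not specified, so the uniform default and the sample values below are assumptions.

```python
def weighted_mean_reward(samples, weights=None):
    """Weighted mean of [dx, dy, dtheta] samples for one state transition."""
    if weights is None:
        weights = [1.0] * len(samples)      # uniform weights by default
    total = sum(weights)
    return [sum(w * s[axis] for w, s in zip(weights, samples)) / total
            for axis in range(3)]

# Hypothetical measurements from three repetitions of the same transition.
samples = [[1.0, 0.5, 0.0], [1.2, 0.4, 0.0], [0.8, 0.6, 0.0]]
reward = weighted_mean_reward(samples)
print([round(v, 3) for v in reward])  # [1.0, 0.5, 0.0]
```

Re-running this step on a new surface is all that is needed to re-learn the reward matrix; the graph and the optimization are unchanged.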

The solution to the optimization problem for a given
maximum length ($l_{max}$) and residual motion
($\epsilon_{\pm y}$, $\epsilon_{\pm\theta}$) consists of the
integers ($x_i$) corresponding to the number of simple cycles
($c_i$) in the gait. Here, we analyze the different simple cycles
obtained from the optimization.

Translation in +X direction: Translation in $+X$ is the
solution of the optimization problem stated in Equations 12,
13. The simple cycle control sequences for $l_{max} = 15$ with
tolerances $\epsilon_{\pm y} = 1$, $\epsilon_{\pm\theta} = 5$ result
in the sequences shown in Figure 5. The use of different simple
cycles to obtain the same goal ($+X$ translation) is an important
result.

Translation in -X direction: Calculation of the simple
cycle control sequences for optimal $-X$ direction translation
can be done by converting the maximization problem to a
minimization problem. Two simple cycles resulting from the
modified optimization problem with the same constraints are
shown in Figure 6.

Fault tolerance: A loss-of-limb scenario is illustrated when
the second actuator/sub-system of the three-limb robot
becomes inoperable. Consequently, the robot cannot transition
into or out of states (010), (011), (110), (111) (Appendix I).
The graph structure is modified by isolating the nodes
corresponding to these states, $N_3, N_4, N_7, N_8$, as shown in
Figure 7. The optimization can be applied to the modified
graph to calculate desired control sequences without
re-learning the state transition rewards. The optimized control
sequences for this graph resulting in $+X$, $-X$ translation
are illustrated in Figures 8 and 9, respectively.

Fig. 5. Three different state control sequences (simple cycles) resulting
from optimization that produce forward translation.

Fig. 6. Two different simple cycle control sequences resulting from
optimization that produce backward translation ($-X$ displacement) of the
soft robot.

The supplemental video illustrates the soft robot executing
simple cycles to translate in forward and backward directions
for both the normal and limb-loss scenarios.
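The node-to-state mapping of Equation 14 and the limb-loss graph modification can be sketched together. The sketch assumes that bits are read left to right as limbs 1 to $m$ and that a failed limb is stuck in behavior 0; the function names are illustrative.

```python
def node_state(i, m=3):
    """State bits of node N_i, i.e. dec2bin(i - 1) padded to m digits."""
    return format(i - 1, "0{}b".format(m))

def isolate_failed_limb(arcs, limb, m=3):
    """Drop every arc touching a node whose bit for the failed limb is 1."""
    def blocked(i):
        return node_state(i, m)[limb - 1] == "1"
    return [(u, v) for u, v in arcs if not blocked(u) and not blocked(v)]

nodes = range(1, 9)
arcs = [(u, v) for u in nodes for v in nodes if u != v]   # 56 arcs
kept = isolate_failed_limb(arcs, limb=2)

isolated = [i for i in nodes if node_state(i)[1] == "1"]
print(isolated)    # [3, 4, 7, 8]
print(len(kept))   # 12
```

The surviving sub-graph has four nodes and twelve arcs, and its arc rewards are reused unchanged, which is why no re-learning is required.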

IV. Conclusion 

The research presents a data-driven, reinforcement learning
inspired model-free control framework that indirectly
models the robot-environment interaction and is summarized
as a four-step process of discretization, visualization,
learning and optimization. The dominant factors of
robot-environment interaction are discretized into a finite number
of behaviors. In this case, these behaviors correspond to one
of two friction conditions, and the combination of multiple
limb behaviors defines the robot states. This discretization also
allows the framework to be generic enough to be adaptable
to a variety of different materials and actuator types.
The framework utilizes graph theory language to describe
control of the soft robot. The finite number of robot states are
represented by the nodes of the directed graph. Similarly,
the transitions between states are represented by the directed
arcs, whereas the arc weights correspond to the result of
the robot transitioning from one state to another. This state
transition reward is dependent on the type of contact and
needs to be learned for locomotion on different surfaces. This
flexibility to learn the state transition rewards can facilitate
adaptation to unexpected changes in the environment. The
use of graph theory facilitates mathematical definition of
periodic motions as simple cycles. These simple cycles allow
formulation of an Integer Linear Programming problem that
can be solved quickly using standard solvers. Furthermore,
the graph representation imparts fault tolerance to
the robot, e.g. in case of a loss-of-limb scenario, graph
nodes are isolated and new control sequences are calculated
by performing optimization on the modified graph without
needing to re-learn state transition rewards. A three-limbed
soft robot is controlled using the presented framework. The
state transition rewards are visually recorded and multiple
simple cycles are obtained for translation in forward and
backward directions. This framework can be extended to
produce more complex gaits, including highly nonlinear
ones.

Fig. 7. Fault-tolerance ability with loss-of-limb scenario. Limb
2 becomes inoperable, thus not allowing transition of the robot into
states corresponding to nodes $N_3, N_4, N_7, N_8$. The nodes are
isolated and optimization can be performed to obtain desired control
sequences without re-learning state transition rewards.

Fig. 8. Two control sequences of the same cost resulting from optimization
of the modified graph for forward translation ($+X$ displacement). The red
color corresponds to the inoperable limb. One recovered panel sequence:
(000) → (001) → (101) → (000).
Appendix I

Visual representation of robot states

The cross-marked limb indicates state 1 or activated
actuator ($\psi_j > \psi^{*}$), while the unmarked limb indicates
state 0 or relaxed actuator ($\psi_j < \psi^{*}$).
Appendix II

State transition rewards for three-limbed experimental robot

Transition   Rewards ($R_i^T$)     Transition   Rewards ($R_i^T$)
1 → 2        [0, 0, 0]             5 → 1        [0, 0, 0]
1 → 3        [0, 0, 0]             5 → 2        [-3, -1, -1]
1 → 4        [0, 0, 0]             5 → 3        [-2, -1, 5]
1 → 5        [0, 0, 0]             5 → 4        [-3, -1.5, -15]
1 → 6        [0, 0, 5]             5 → 6        [-4, -1, -2]
1 → 7        [0, 0, -5]            5 → 7        [-4, 0, 2]
1 → 8        [1, 0.5, 0]           5 → 8        [-1, 0, 0]
2 → 1        [0, 0, 0]             6 → 1        [0, 0, 0]
2 → 3        [0, 0, 0]             6 → 2        [-2, -0.5, 2]
2 → 4        [0, -0.5, -10]        6 → 3        [-1, 0.5, 10]
2 → 5        [1, 0.5, -10]         6 → 4        [-3, -3, -15]
2 → 6        [1, 0, -1]            6 → 5        [0, 0, 0]
2 → 7        [2, 1, -2]            6 → 7        [0.5, 0, 2]
2 → 8        [3.5, 0.5, 0]         6 → 8        [3, -1.5, 0]
3 → 1        [0, 0, 0]             7 → 1        [0, 0, 0]
3 → 2        [0, 0, 0]             7 → 2        [-2, 0, 0]
3 → 4        [0, 0.5, 15]          7 → 3        [-2, 0.5, -2]
3 → 5        [2, -0.5, 30]         7 → 4        [-0.5, 0.25, 0]
3 → 6        [1, 0, 10]            7 → 5        [-0.5, -0.25, 0]
3 → 7        [1, 0, 1]             7 → 6        [-1, 0, 0]
3 → 8        [2, -0.75, 7.5]       7 → 8        [3, 1.5, 0]
4 → 1        [0, 0, 0]             8 → 1        [1, 0.5, 0]
4 → 2        [0, -0.5, -10]        8 → 2        [-2.5, -2.5, -4]
4 → 3        [0, -0.5, -10]        8 → 3        [0.5, 0.5, 2]
4 → 5        [4, 0.5, 15]          8 → 4        [-1, -1, 0]
4 → 6        [3, 0, -6]            8 → 5        [5, 0, 0]
4 → 7        [2, 0.5, 0]           8 → 6        [3, -1.5, 0]
4 → 8        [3, -0.5, 0]          8 → 7        [1, 0.5, 0]